Humans and deep networks largely agree on which kinds of variation make object recognition harder
View-invariant object recognition is a challenging problem, which has
attracted much attention among the psychology, neuroscience, and computer
vision communities. Humans are notoriously good at it, even if some variations
are presumably more difficult to handle than others (e.g. 3D rotations). Humans
are thought to solve the problem through hierarchical processing along the
ventral stream, which progressively extracts more and more invariant visual
features. This feed-forward architecture has inspired a new generation of
bio-inspired computer vision systems called deep convolutional neural networks
(DCNNs), which are currently the best algorithms for object recognition in
natural images. Here, for the first time, we systematically compared human
feed-forward vision and DCNNs at view-invariant object recognition using the
same images and controlling for both the kind of transformation and its
magnitude. We used four object categories, and images were rendered from
3D computer models. In total, 89 human subjects participated in 10 experiments
in which they had to discriminate between two or four categories after rapid
presentation with backward masking. We also tested two recent DCNNs on the same
tasks. We found that humans and DCNNs largely agreed on the relative
difficulties of each kind of variation: rotation in depth is by far the hardest
transformation to handle, followed by scale, then rotation in plane, and
finally position. This suggests that humans recognize objects mainly through 2D
template matching, rather than by constructing 3D object models, and that DCNNs
are not too unreasonable models of human feed-forward vision. Also, our results
show that the variation levels in rotation in depth and scale strongly modulate
both humans' and DCNNs' recognition performance. We thus argue that these
variations should be controlled for in the image datasets used in vision research.
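The reported ordering (depth rotation hardest, position easiest) is what a 2D template-matching account predicts: a translation can be undone by sliding the template over the image, whereas a rotation has no comparably cheap inverse for a fixed 2D template. A minimal numerical sketch of this asymmetry, using a synthetic pattern rather than the study's rendered stimuli (all parameters here are illustrative):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
# Toy "object": smoothed random noise standing in for an object image.
obj = ndimage.gaussian_filter(rng.standard_normal((64, 64)), sigma=3)

def cosine(a, b):
    """Correlation between two images (mean-subtracted cosine similarity)."""
    a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A position change is trivially undone by sliding the template over the image:
shifted = np.roll(obj, (8, 5), axis=(0, 1))
pos_score = max(cosine(np.roll(shifted, (-dy, -dx), axis=(0, 1)), obj)
                for dy in range(16) for dx in range(16))

# An in-plane rotation has no such cheap inverse for a fixed 2D template:
rotated = ndimage.rotate(obj, 40, reshape=False, mode='nearest')
rot_score = cosine(rotated, obj)

print(pos_score, rot_score)  # translation is recovered almost perfectly; rotation is not
```

In this toy matcher the translated pattern is matched essentially perfectly while the rotated one is not, mirroring the relative difficulty the abstract reports for both humans and DCNNs.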
Towards building a more complex view of the lateral geniculate nucleus: Recent advances in understanding its role
The lateral geniculate nucleus (LGN) has often been treated in the past as a linear filter that adds little to retinal processing of visual inputs. Here we review anatomical, neurophysiological, brain imaging, and modeling studies that have in recent years built up a much more complex view of the LGN. These include effects related to nonlinear dendritic processing, cortical feedback, synchrony and oscillations across LGN populations, as well as the involvement of the LGN in higher-level cognitive processing. Although recent studies have provided valuable insights into early visual processing, including the role of the LGN, a unified model of LGN responses to real-world objects has not yet been developed. In the light of recent data, we suggest that the role of the LGN deserves more careful consideration in developing models of high-level visual processing.
Anterior-posterior gradient in the integrated processing of forelimb movement direction and distance in macaque parietal cortex
A major issue in modern neuroscience is to understand how cell populations represent multiple spatial and motor features during goal-directed movements. The direction and distance (depth) of arm movements often appear to be controlled independently during behavior, but it is unknown whether they share neural resources. Using information theory, singular value decomposition, and dimensionality reduction methods, we compare direction and depth effects and their convergence across three parietal areas during an arm movement task. All methods show a stronger direction effect during early movement preparation, whereas depth signals prevail during movement execution. Going from anterior to posterior sectors, we report an increasing number of cells processing both signals and stronger depth effects. These findings suggest serial processing of direction and depth, consistent with behavioral evidence, and reveal a gradient of joint versus independent control of these features in parietal cortex that supports its role in sensorimotor transformations.
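The comparison of direction and depth effects can be illustrated with a toy version of the marginal-variance logic behind such population analyses. This is a generic demixing-style sketch on synthetic data with made-up tuning weights, not the paper's actual methods or recordings:

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_dir, n_dep = 50, 8, 3

# Synthetic tuning: each neuron linearly mixes a direction signal and a
# (deliberately weaker) depth signal, plus noise.
dirs = np.cos(np.linspace(0, 2 * np.pi, n_dir, endpoint=False))
deps = np.linspace(-1, 1, n_dep)
w_dir = rng.standard_normal(n_neurons)
w_dep = 0.5 * rng.standard_normal(n_neurons)
rates = (w_dir[:, None, None] * dirs[None, :, None]
         + w_dep[:, None, None] * deps[None, None, :]
         + 0.1 * rng.standard_normal((n_neurons, n_dir, n_dep)))

# Crude marginal-variance split: variance of each factor's marginal mean
# around the grand mean (the intuition behind demixed analyses).
grand = rates.mean(axis=(1, 2), keepdims=True)
var_dir = ((rates.mean(axis=2, keepdims=True) - grand) ** 2).mean()
var_dep = ((rates.mean(axis=1, keepdims=True) - grand) ** 2).mean()
print(var_dir, var_dep)  # the built-in direction effect dominates
```

Averaging out one factor before computing variance isolates the other factor's contribution, which is the basic move underlying comparisons of direction versus depth coding strength.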
A specialized face-processing model inspired by the organization of monkey face patches explains several face-specific phenomena observed in humans
Converging reports indicate that face images are processed through specialized neural networks in the brain, i.e. face patches in monkeys and the fusiform face area (FFA) in humans. These studies were designed to find out how faces are processed in the visual system compared to other objects, yet the underlying mechanism of face processing has not been fully revealed. Here, we show that a hierarchical computational model, inspired by electrophysiological evidence on face processing in primates, is able to generate representational properties similar to those observed in monkey face patches (posterior, middle, and anterior patches). Since a central goal of sensory neuroscience is linking neural responses with behavioral outputs, we test whether the proposed model, which is designed to account for neural responses in monkey face patches, can also predict well-documented behavioral face phenomena observed in humans. We show that the proposed model reproduces several cognitive face effects, such as the composite face effect and canonical face views. Our model provides insights into the underlying computations that transfer visual information from posterior to anterior face patches.
A Stable Biologically Motivated Learning Mechanism for Visual Feature Extraction to Handle Facial Categorization
The brain's mechanism for extracting visual features to recognize various objects has consistently been a controversial issue in computational models of object recognition. To extract visual features, we introduce a new, biologically motivated model for facial categorization, which is an extension of the Hubel and Wiesel simple-to-complex cell hierarchy. To address the synaptic stability-versus-plasticity dilemma, we apply Adaptive Resonance Theory (ART) to extract informative intermediate-level visual features during the learning process, which also keeps the model stable against destruction of previously learned information while new information is learned. Such a mechanism has been suggested to be embedded within known laminar microcircuits of the cerebral cortex. To demonstrate the strength of the proposed visual feature learning mechanism, we show that when it is used in the training process of a well-known biologically motivated object recognition model (the HMAX model), it performs better than the original HMAX model on face/non-face classification tasks. Furthermore, we demonstrate that the proposed mechanism follows performance trends similar to those of humans in a psychophysical experiment using a face versus non-face rapid categorization task.
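The stability-versus-plasticity idea at the heart of ART can be shown with a deliberately tiny sketch: an input either "resonates" with an existing prototype (and refines it) or recruits a new one, so learning a new category never overwrites unrelated, previously learned prototypes. This is a generic ART-style toy, not the paper's model; the vigilance and learning-rate values are arbitrary:

```python
import numpy as np

def art_learn(patterns, vigilance=0.7, lr=0.5):
    """ART-style clustering toy: each input either resonates with (and refines)
    an existing prototype or recruits a fresh one."""
    prototypes = []
    for x in patterns:
        x = np.asarray(x, dtype=float)
        sims = [np.minimum(x, p).sum() / x.sum() for p in prototypes]
        j = int(np.argmax(sims)) if sims else -1
        if sims and sims[j] >= vigilance:
            # Resonance: move the winning prototype toward its overlap with x.
            prototypes[j] = (1 - lr) * prototypes[j] + lr * np.minimum(x, prototypes[j])
        else:
            # Mismatch everywhere: recruit a new prototype rather than
            # overwriting an old one (this is the stability guarantee).
            prototypes.append(x.copy())
    return prototypes

faces = [np.array([1.0, 1.0, 0.0, 0.0]), np.array([1.0, 0.9, 0.1, 0.0])]
cars = [np.array([0.0, 0.0, 1.0, 1.0])]
protos = art_learn(faces + cars)
print(len(protos))  # 2: the car input did not disturb the learned face prototype
```

Raising the vigilance parameter makes categories narrower (more prototypes); lowering it makes them broader, which is the plasticity end of the dilemma.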
How Can Selection of Biologically Inspired Features Improve the Performance of a Robust Object Recognition Model?
Humans can effectively and swiftly recognize objects in complex natural scenes. This outstanding ability has motivated many computational object recognition models, most of which try to emulate the behavior of this remarkable system. The human visual system recognizes objects hierarchically, in several processing stages. Along these stages, features of increasing complexity are extracted by different parts of the visual system: elementary features such as bars and edges are processed at earlier levels of the visual pathway, and increasingly complex features are detected further up the pathway. An important question in the field of visual processing is which features of an object are selected and represented by the visual cortex. To address this issue, we extended a biologically motivated hierarchical model for different object recognition tasks. In this model, a set of object parts, called patches, is extracted in the intermediate stages. These object parts are used in the model's training procedure and play an important role in object recognition. Because these patches are selected indiscriminately from different positions in an image, non-discriminative patches can be extracted, which may eventually reduce performance. In the proposed model, we used an evolutionary algorithm to select a set of informative patches. Our results indicate that these patches are more informative than the usual random patches. We demonstrate the strength of the proposed model on a range of object recognition tasks, where it outperforms the original model. The experiments show that the selected features are generally particular parts of the target images. Our results suggest that selected features that are parts of target objects provide an efficient set for robust object recognition.
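The patch-selection idea can be sketched with a generic genetic algorithm that evolves a binary mask over candidate features, keeping those that separate two classes. This toy uses synthetic data and a deliberately simple fitness function, not the paper's HMAX patches or its actual evolutionary operators:

```python
import numpy as np

rng = np.random.default_rng(2)
n_feat = 30

# Synthetic two-class data: only the first 5 features actually separate classes.
X0 = rng.standard_normal((100, n_feat))
X1 = rng.standard_normal((100, n_feat))
X1[:, :5] += 2.0

def fitness(mask):
    """Mean between-class separation of the selected features."""
    if mask.sum() == 0:
        return -np.inf
    return float(np.abs(X0[:, mask].mean(0) - X1[:, mask].mean(0)).mean())

# Minimal genetic algorithm: truncation selection plus point mutation.
pop = rng.random((40, n_feat)) < 0.5
for gen in range(60):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]        # keep the 10 fittest masks
    children = parents[rng.integers(0, 10, 30)].copy()
    children ^= rng.random(children.shape) < 0.05  # flip ~5% of bits
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print(np.flatnonzero(best))  # indices of the selected features
```

Because non-discriminative features dilute the fitness score, selection pressure prunes them away, which is the same intuition as discarding uninformative random patches.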
Low-level contrast statistics of natural images can modulate the frequency of event-related potentials (ERP) in humans
Humans are fast and accurate in categorizing complex natural images. It is, however, unclear which features of visual information are exploited by the brain to perceive images with such speed and accuracy. It has been shown that low-level contrast statistics of natural scenes can explain the variance in the amplitude of event-related potentials (ERPs) in response to rapidly presented images. In this study, we investigated the effect of these statistics on the frequency content of ERPs. We recorded ERPs from human subjects while they viewed natural images, each presented for 70 ms. Our results showed that Weibull contrast statistics, as a biologically plausible model, explained the variance of the ERPs best among the image statistics we assessed. Our time-frequency analysis revealed a significant correlation between these statistics and ERP power within the theta frequency band (~3-7 Hz). This is interesting, as the theta band is believed to be involved in context updating and semantic encoding. This correlation became significant at ~110 ms after stimulus onset and peaked at 138 ms. Our results show that not only the amplitude but also the frequency of neural responses can be modulated by low-level contrast statistics of natural images, highlighting their potential role in scene perception.
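Weibull contrast statistics of this kind are typically obtained by fitting a two-parameter Weibull distribution to the distribution of local contrast (gradient magnitude) values of an image. A minimal sketch, using smoothed noise as a stand-in for a natural image (the filter scales here are illustrative, not the study's settings):

```python
import numpy as np
from scipy import ndimage, stats

rng = np.random.default_rng(3)
# Smoothed noise as a stand-in "image"; a real analysis would load natural images.
img = ndimage.gaussian_filter(rng.standard_normal((128, 128)), sigma=2)

# Local contrast: gradient magnitude from Gaussian-derivative filters.
gx = ndimage.gaussian_filter(img, sigma=1, order=(0, 1))
gy = ndimage.gaussian_filter(img, sigma=1, order=(1, 0))
contrast = np.hypot(gx, gy).ravel()

# Fit a two-parameter Weibull (location fixed at 0): the shape and scale
# parameters are the per-image statistics regressed against ERP responses.
shape, _, scale = stats.weibull_min.fit(contrast, floc=0)
print(shape, scale)
```

The fitted shape and scale then serve as compact per-image descriptors that can be correlated with neural measures such as ERP amplitude or band-limited power.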
Dynamics of information processing in the primary visual cortex
Vision underlies innumerable behaviours and actions. Despite tremendous, dynamic changes in the inputs to the visual system, such as variations in ambient light levels, our perception of the visual world is highly reliable. Understanding how dynamic visual information is processed by the brain under different conditions is a fundamental goal in neuroscience. Here, we aimed to characterise how features are processed by neurons in the primary visual cortex when stimulus luminance and contrast vary. We showed that the temporal properties of feature processing depend on the luminance and contrast of the stimuli, and that neurons dynamically adjust their coding strategies based on changes in stimulus properties. Our study proposes a novel approach to investigating adaptive information processing in a multidimensional stimulus space.